Essential role of Cp190 in physical and regulatory boundary formation

Boundaries in animal genomes delimit contact domains with enhanced internal contact frequencies and have debated functions in limiting regulatory cross-talk between domains and guiding enhancers to target promoters. Most mammalian boundaries form by stalling of chromosomal loop-extruding cohesin by CTCF, but most Drosophila boundaries form independently of CTCF. However, how CTCF-independent boundaries form and function remains largely unexplored. Here, we assess genome folding and developmental gene expression in fly embryos lacking the ubiquitous boundary-associated factor Cp190. We find that sequence-specific DNA-binding proteins such as CTCF and Su(Hw) directly interact with and recruit Cp190 to form most promoter-distal boundaries. Cp190 is essential for early development and prevents regulatory cross-talk between specific gene loci that pattern the embryo. In contrast, Cp190 is dispensable for long-range enhancer-promoter communication at tested loci. Cp190 is thus currently the major player in fly boundary formation and function, revealing that diverse mechanisms evolved to partition genomes into independent regulatory domains.


The PDF file includes:
Figs. S1 to S8
Tables S1 to S4
Legends for data S1 to S15

Other Supplementary Material for this manuscript includes the following:
Data S1 to S15

A. Western blotting of whole cell extracts from 6-10 hour old embryos of indicated genotypes. Blotting membranes were cut at the 70 kDa marker height. The upper part was probed first with anti-CTCF and then with anti-Cp190, while the lower part was probed with anti-alpha-tubulin to verify similar loading of each extract. Each picture delimited by a black border is of the same membrane.
G. Scatter plot of physical insulation score differences measured in Cp190⁰ minus WT Hi-C maps (in y) at all 1140 Cp190-occupied boundaries (points) versus distance (in bp, on a log(x+1)-transformed x axis) from the position of the most boundary-proximal Cp190 peak to the center of the nearest indicated motif. Motifs enriched in non-promoter boundaries are in the first column; motifs enriched in promoter boundaries are in the second column. Box plots of indicated n Cp190 peaks binned by distance to motif are overlaid. This shows that insulation defects in Cp190⁰ are higher closer to motifs enriched in non-promoter boundaries (CTCF, Ibf1, Su(Hw)) than to motifs enriched at promoter boundaries (BEAF-32, M1BP, core motif-6, ZIPIC). These scatter plots accompany the box plots shown in Fig. 1D.
H. Scatter plot of physical insulation score differences measured in Cp190⁰ minus WT Hi-C maps (in y) versus Cp190 ChIP occupancy (in x) for each WT Cp190 ChIP peak (points). Insulation scores are measured at the nearest bin boundary to the Cp190 peak position. Box plots of indicated n Cp190 peaks binned by ChIP occupancy are overlaid. This shows that insulation defects in Cp190⁰ scale with Cp190 ChIP occupancy, except at rare very high occupancy Cp190 ChIP peaks.

B. Scatter plot of physical insulation score differences measured in CTCF⁰ minus WT Hi-C maps (in y) versus CTCF ChIP occupancy (in x) for each CTCF peak (points). Insulation scores are measured at the nearest bin boundary to the CTCF peak position. Box plots of indicated n CTCF peaks binned by ChIP occupancy are overlaid. Box plot center line is median; box limits are upper and lower quartiles; whiskers are 1.5x interquartile ranges. This shows that insulation defects in CTCF⁰ are observed at low and high occupancy CTCF ChIP peaks, but not at intermediate occupancy CTCF ChIP peaks (which are not at boundaries; see Fig. 2A).
C. Numbers of CTCF peaks in WT (n=1477 peaks) that overlap a Cp190 peak in WT or not (rows), and have a contact domain boundary in WT within ±2 kb of the peak position or not (columns). Cells are colored by log10(n_observed/n_expected), where n_expected is the expected value assuming independence of rows and columns (see Methods). CTCF co-localization with Cp190 is significantly positively associated with localization at a boundary (odds ratio and p-value are from two-sided Fisher's Exact Test for Count Data).
D. Numbers of CTCF peaks in WT (n=1477 peaks) that overlap a Cp190 peak in WT or not (rows), and whose peak position is inside an intron or not (columns). Cells are colored by log10(n_observed/n_expected), where n_expected is the expected value assuming independence of rows and columns (see Methods). CTCF co-localization with Cp190 is significantly negatively associated with localization in an intron (odds ratio and p-value are from two-sided Fisher's Exact Test for Count Data).
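
The cell coloring and association test described for panels C and D can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the toy counts are invented, the odds ratio is the plain cross-product ratio (some statistical packages report a conditional maximum-likelihood estimate instead), and the two-sided p-value sums all tables that are no more probable than the observed one, a common convention assumed here rather than taken from the Methods.

```python
from math import comb, log10


def expected_counts(table):
    """Expected counts of a 2x2 table if rows and columns were independent."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[row_totals[i] * col_totals[j] / n for j in range(2)]
            for i in range(2)]


def log10_observed_over_expected(table):
    """Per-cell log10(observed / expected); needs non-zero observed counts."""
    expected = expected_counts(table)
    return [[log10(table[i][j] / expected[i][j]) for j in range(2)]
            for i in range(2)]


def sample_odds_ratio(table):
    """Cross-product odds ratio (a*d)/(b*c); undefined if b or c is 0."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def fisher_two_sided(table):
    """Two-sided Fisher exact p-value with all margins held fixed."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total_ways = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_ways

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(p for p in map(prob, range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Invented toy table: rows = overlaps Cp190 (yes/no), columns = at boundary (yes/no).
toy = [[30, 10], [20, 40]]
print(log10_observed_over_expected(toy))
print(sample_odds_ratio(toy), fisher_two_sided(toy))
```

Positive cell values mark combinations seen more often than independence predicts, and negative values mark depleted combinations.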

A. Scatter plot of physical insulation score differences measured by Hi-C in CTCF⁰ (top), Cp190⁰ (middle) or double⁰ (bottom) minus WT (in y) at all 312 boundaries occupied by CTCF in WT (i.e. all boundaries within ±2 kb of a CTCF ChIP peak in WT, points) versus distance (in bp, on a log(x+1)-transformed x axis) from the position of the most boundary-proximal CTCF peak to the nearest Cp190 peak in CTCF⁰. Box plots of indicated n CTCF peaks binned by distance to the Cp190 peak are overlaid. Box plot center line is median; box limits are upper and lower quartiles; whiskers are 1.5x interquartile ranges. This shows that boundary defects in CTCF⁰ are smaller when the boundary is close to a residual Cp190 peak, which is not the case in Cp190⁰ or double⁰. These scatter plots accompany the box plots shown in Fig. 3C.
B. Same as A but versus distance from the position of the most boundary-proximal CTCF peak to the nearest transcribed TSS (RPKM>0) in WT 4-6 hour old embryos [expression data from Graveley et al. (75)]. This shows that insulation defects in all genotypes are smaller at boundaries close to transcribed TSSs. These scatter plots accompany the box plots shown in Fig. 3D.
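
The overlaid box plots in panels A and B summarize insulation score differences within distance bins. The sketch below shows one way such a summary could be computed. It makes several assumptions that are not stated in the legend: the logarithm is base 10, bins are whole decades of log10(distance + 1), quartiles use Python's "inclusive" interpolation, and whiskers follow the Tukey convention of reaching the most extreme point within 1.5x the interquartile range of the box. Function names and data values are hypothetical.

```python
from collections import defaultdict
from math import floor, log10
from statistics import quantiles


def bin_by_log_distance(points):
    """Group (distance_bp, insulation_difference) pairs into decade bins.

    Bin key k holds points with k <= log10(distance_bp + 1) < k + 1.
    """
    bins = defaultdict(list)
    for distance_bp, difference in points:
        bins[floor(log10(distance_bp + 1))].append(difference)
    return dict(bins)


def box_stats(values):
    """Median, quartiles and Tukey whiskers for one bin (needs >= 2 values)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Invented example values, not data from the study.
points = [(0, 0.10), (5, 0.20), (50, 0.30), (60, 0.35), (500, 0.40), (700, 0.50)]
for key, values in sorted(bin_by_log_distance(points).items()):
    if len(values) >= 2:
        print(key, box_stats(values))
```

Adding 1 before taking the logarithm keeps peaks at distance 0 bp on the axis, which is why the legends specify a log(x+1) transform.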

C. Example locus (dm6 coordinates) Hi-C maps, physical insulation score (calculated with different window sizes in gray, average in black) and contact domain boundaries (vertical red lines) from this study (above) and published Hi-C studies in embryos [Hug et al. (67)] and tissue culture cells [Ramírez et al. (8)] (below), Cp190 ChIP-seq with Cp190 peaks defined in the respective genotype relative to Cp190⁰ highlighted in light blue, CTCF ChIP-seq with CTCF peaks defined in the respective genotype relative to CTCF⁰ highlighted in dark blue and numbered 1 to 4, and gene tracks (only longest isoform of each protein-coding gene shown) in embryos of the indicated genotypes. ChIP-seq scale is reads per million. Differential Hi-C maps (mutants minus WT in row 2, and CTCF⁰ minus Cp190⁰ in row 3), physical insulation score and contact domain boundaries are shown below. A non-specific CTCF ChIP-seq signal detected in CTCF⁰ is marked by a black asterisk (the signal was higher in WT and thus called a CTCF peak, but it was not sufficiently high in Cp190⁰ to be called a CTCF peak in the differential analysis). Arrowheads point to CTCF+Cp190 co-bound peaks located at domain boundaries in WT, while empty arrowheads indicate sites where these peaks are absent in the indicated genotypes. The Cp190 peak co-localizing with CTCF peak 1 is partially CTCF-dependent because its height is reduced but the peak is still detected in CTCF⁰. The Cp190 peaks co-localizing with CTCF peaks 2-4 are strictly CTCF-dependent because these peaks are lost in CTCF⁰. This locus illustrates that contact domain boundaries at partially CTCF-dependent Cp190 peaks (peak 1) are more strongly affected in Cp190⁰ than in CTCF⁰, whereas boundaries at strictly CTCF-dependent Cp190 peaks are more strongly affected in CTCF⁰ than in Cp190⁰.
E. GFP pull-down of tagged full-length Ibf1 (left) or Ibf2 (right), each co-expressed with untagged Cp190 and Cp60. GFP pull-down in the absence of Ibf1 or Ibf2 is shown as negative control. Positions of molecular weight ladder bands are marked on the left, those of co-expressed proteins are marked on the right.

A. Same locus as shown in Fig. 5A, additionally showing eigenvector values (2 kb resolution, positive for A compartment, negative for B compartment), and CTCF ChIP-seq with CTCF peaks defined in the respective genotype relative to CTCF⁰ in dark blue, in embryos of the indicated genotypes. Differential Hi-C maps, physical insulation scores and contact domain boundaries in the respective mutants minus WT are shown below. Note limitations in TopDom boundary calls at this locus: the ftz upstream boundary was called but is shifted to the left of its visible position in the Hi-C map, and the ftz downstream boundary was not called robustly by TopDom and was hence subsequently filtered out in our analysis (see Methods). ftz boundaries are therefore visible by Hi-C but were not called because they challenge the classical definition of boundaries due to the unusually strong inter-domain interactions occurring across them at this exceptional locus.

..., consistent with the possibility that it may be a distal Scr enhancer in older embryos. Enhancer 6 is silent in young embryos, further suggesting that it is unlikely to be responsible for early Scr expression. Pictures of reporter gene expression driven by these enhancers were obtained from https://enhancers.starklab.org/.

B. Same locus shown in
C. Like Fig. 5C. RNA-FISH with co-hybridized antisense probes against Scr (red) and ftz (green) mRNAs in early gastrula CTCF⁰ embryos stained with DAPI (blue) and imaged from the side (anterior left, posterior right, scale bars 100 µm). Single Scr and ftz images are shown on the left, and a merged image of the same embryo is shown on the right. This shows that Scr and ftz appear normally expressed in CTCF⁰ embryos.
D. RNA-FISH with antisense probes (red) against Scr (left) or ftz mRNAs (right) in stage 14 (mid-embryogenesis) embryos stained with DAPI (anterior left, posterior right, scale bars 100 µm). In WT embryos, Scr is expressed in anterior (labial and prothoracic) segments and the anterior midgut. Scr is expressed normally in all genotypes (rows) except Cp190⁰ mutants, which misexpress Scr in the hindgut and anal plate (arrowheads). Misexpression in the anal plate becomes stronger in older embryos (see Fig. 6C). In WT embryos, ftz is expressed in a subset of cells of the ventral nerve cord. ftz is expressed normally in all genotypes (white asterisk marks background staining).

A. Bithorax-complex locus (dm6 coordinates) Hi-C maps (2 kb resolution), eigenvector values (2 kb resolution, positive for A compartment, negative for B compartment), physical insulation score (calculated with different window sizes in gray, average in black) and contact domain boundaries (vertical red lines) from this study (...

B. Ventral nerve cords dissected from stage 15 embryos (at mid-embryogenesis) (oriented with anterior up) of the indicated genotypes subjected to RNA-FISH with probes against the indicated abdominal HOX gene mRNAs (red), followed by immunostaining with anti-Engrailed (green) to mark parasegment boundaries and DAPI-labeling (blue) of DNA. Parasegments (PS) are labeled. Phenotypes consistently seen in Cp190⁰ mutants are marked by arrowheads: the empty arrowhead shows lower Ubx mRNA levels in PS6 in Cp190⁰ than in WT, and solid arrowheads show higher abd-A mRNA levels in PS13 and PS14 in Cp190⁰ than in WT. Scale bars below each nerve cord show 50 µm.

Figure S8
eve locus (dm6 coordinates) Hi-C maps (2 kb resolution), eigenvector values (2 kb resolution, positive for A compartment, negative for B compartment), physical insulation score (calculated with different window sizes in gray, average in black) and contact domain boundaries (vertical red lines) from this study (above) and published Hi-C studies in WT 3-4 hour old embryos [Hug et al. (67)] and tissue culture cells [Ramírez et al. (8)] (below), Cp190 ChIP-seq (in reads per million), Cp190 peaks defined as enriched in WT relative to Cp190⁰ in light blue, CTCF ChIP-seq, CTCF peaks defined as enriched in WT relative to CTCF⁰ in dark blue, characterized Nhomie and Homie insulators [Fujioka et al. (37)], and gene tracks (only longest isoform of each protein-coding gene shown, homeobox genes are blue). This shows that Homie overlaps a Cp190-occupied boundary in WT that is not visibly affected in Cp190⁰ mutants.